55 research outputs found

Optimization frameworks and sensitivity analysis of Stackelberg mean-field games

Authors: Guo Xin, Hu Anran, Zhang Jiacheng
Publication date: 08/10/2022

This paper proposes and studies a class of discrete-time finite-time-horizon Stackelberg mean-field games, with one leader and an infinite number of identical and indistinguishable followers. In this game, the objective of the leader is to maximize her reward considering the worst-case cost over all possible $\epsilon$-Nash equilibria among followers. A new analytical paradigm is established by showing the equivalence between this Stackelberg mean-field game and a minimax optimization problem. This optimization framework facilitates studying, both analytically and numerically, the set of Nash equilibria for the game, and leads to the sensitivity and robustness analysis of the game value. In particular, when there is model uncertainty, the game value for the leader suffers non-vanishing sub-optimality as the perturbed model converges to the true model. In order to obtain a near-optimal solution, the leader needs to be more pessimistic with anticipation of model errors and adopt a relaxed version of the original Stackelberg game.

arXiv.org e-Print Archive

Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Authors: Basei Matteo, Guo Xin, Hu Anran
Publication date: 10/11/2020

In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield a linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on a regularized least-squares estimation, and establish a sublinear regret bound of order $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property. Comment: 25 pages

arXiv.org e-Print Archive

Mixed polymer hydrophilic matrices containing HPMC and PEO

Author: Hu Anran

The research in this thesis describes investigations of (i) mixed polymer HPMC and PEO hydrophilic matrices and their performance in low and high ionic environments and (ii) the internal behaviour of HPMC and PEO mixed systems. It was postulated that using a blend of these polymers might provide advantages over the use of single polymers. A series of ‘realistic’ 30% w/w polymer matrix formulations, containing different weight ratios of HPMC and PEO and a soluble model drug (caffeine), were tested in ionically challenging media, up to 1M sodium chloride (NaCl). Dissolution testing showed that HPMC-dominated formulations exhibited accelerated release in high ionic strength media (0.8M NaCl or higher), whereas PEO-dominated formulations did not. Power law analyses suggested that the release mechanism of matrices in 0.6M NaCl and below was anomalous non-Fickian transport, but case II transport was observed in HPMC-dominated matrices at 0.8M NaCl and above. A polymer ratio of 4:6 HPMC:PEO allowed an extended release tablet to be formulated that was resistant to 1M NaCl. In 0.6M NaCl or below, increasing the proportion of HPMC in mixed HPMC:PEO tablets increased the duration of extended release. Confocal laser scanning microscopy was used to investigate the structure of the HPMC:PEO matrix hydrated gel layer. The results provided evidence that HPMC and PEO particles swell independently in the gel layer. They remained substantially unmixed during gel layer formation, and each appeared to contribute independently to gel layer structure. Magnetic resonance imaging showed that PEO matrices hydrated more rapidly than HPMC matrices, but PEO matrices completely dissolved after 9 hours. In the case of 4:6 HPMC:PEO and HPMC matrices, a hydrated gel remained. This reflected the behaviour of these matrices in the dissolution tests.
Unfortunately, MRI could only be applied in zero salt media, as the dielectric properties of NaCl interfered with the results, and other techniques were required to examine matrix behaviour in high salt media. Texture analysis showed that at low NaCl concentrations, the HPMC gel layer exhibited higher gel strength than PEO, and that substituting HPMC for PEO increased gel layer strength. The later stages of gel layer morphology were also investigated by digital optical macroscopy. Images showed greater gel longevity of HPMC and mixed matrices, with evidence for a higher gel strength and less erosion than PEO matrices. Swelling of single polymer particles showed that increasing NaCl concentration significantly inhibited HPMC particle swelling but had only a limited effect on PEO particle swelling. The ability of PEO particles to swell in high salt media may explain the resistance of PEO matrices to high NaCl dissolution media. The miscibility of HPMC and PEO in dilute solution was studied by rheology and phase contrast microscopy. Measurements of storage modulus (G′) at 1% w/v showed that most polymer mixtures exhibited negative deviations from ideal mixing at all oscillatory frequencies studied (0.1Hz, 1Hz, and 10Hz). This is evidence that these polymers are immiscible in solution. Phase contrast microscopy provided direct optical evidence of phase separation in blended HPMC:PEO solutions (4% w/v). The tendency of these polymers to be immiscible suggests that they may also be phase separated in the more concentrated environment of the gel layer. Gel layer morphology in binary polymer tablets was investigated directly by confocal microscopy (up to 15 min) and by attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR) up to 3 hours. The confocal microscopy images showed that HPMC and PEO appeared to swell independently during early gel layer formation. Each polymer appeared to contribute independently to gel layer structure.
ATR-FTIR imaging allowed chemical mapping of the three components (water, PEO and HPMC) in the gel layer, providing evidence that each polymer formed individual domains. PEO appears to be more extensively swollen than HPMC and may form the outer part of the gel layer, protecting HPMC from the effect of high ionic media. The work in this thesis suggests that mixed polymer HPMC:PEO matrices may have certain advantages over matrices containing only a single polymer. PEO confers resistance to highly ionic media, while HPMC provides a longer drug release than PEO alone. Each polymer appears to contribute separately to the gel layer, but the ability of PEO to swell in highly ionic environments may allow formation of a diffusion barrier that protects the incorporated HPMC from ionic media and allows it to contribute to gel layer structure.
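
The power law analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes the power law takes the commonly used Korsmeyer-Peppas form, fraction released = k * t^n, which the abstract does not state explicitly. The exponent thresholds (0.45 and 0.89, the values usually quoted for cylindrical tablets) and the function names are assumptions for illustration, and the data are synthetic, not taken from the thesis.

```python
import math


def fit_power_law(times, fractions):
    """Fit fraction = k * t**n by least squares on log-transformed data.

    Returns (k, n). Both inputs must be positive; the power law is usually
    applied only to the early part of the release curve.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(f) for f in fractions]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                       # slope of the log-log line
    k = math.exp(y_mean - n * x_mean)   # intercept, back-transformed
    return k, n


def classify_mechanism(n, fickian=0.45, case_ii=0.89):
    """Map the release exponent n to a transport label.

    The default thresholds are assumed values for a cylindrical geometry.
    """
    if n <= fickian:
        return "Fickian diffusion"
    if n < case_ii:
        return "anomalous (non-Fickian) transport"
    return "case II transport (or beyond)"


# Synthetic release profile generated from k = 0.2, n = 0.6 (hours, fraction).
times = [0.5, 1.0, 2.0, 4.0]
fractions = [0.2 * t ** 0.6 for t in times]

k, n = fit_power_law(times, fractions)
print(f"k = {k:.3f}, n = {n:.3f}, mechanism: {classify_mechanism(n)}")
```

An exponent between the two thresholds corresponds to the anomalous transport the abstract reports at 0.6M NaCl and below, while an exponent at or above the upper threshold corresponds to case II transport.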

Nottingham ePrints

A General Framework for Learning Mean-Field Games

Authors: Guo Xin, Hu Anran, Xu Renyuan, Zhang Junzi
Publication date: 10/10/2021

This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash Equilibrium to this GMFG, and demonstrates that naively combining reinforcement learning with the fixed-point approach in classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, with analysis of their convergence properties and computational complexities. Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning and TRPO, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing algorithms for multi-agent reinforcement learning in the $N$-player setting. Comment: 43 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1901.0958

arXiv.org e-Print Archive

MFGLib: A Library for Mean-Field Games

Authors: Guo Xin, Hu Anran, Santamaria Matteo, Tajrobehkar Mahan, Zhang Junzi
Publication date: 17/04/2023

Mean-field games (MFGs) are limiting models to approximate $N$-player games, with a number of applications. Despite the ever-growing numerical literature on computation of MFGs, there is no library that allows researchers and practitioners to easily create and solve their own MFG problems. The purpose of this document is to introduce MFGLib, an open-source Python library for solving general MFGs with a user-friendly and customizable interface. It serves as a handy tool for creating and analyzing generic MFG environments, along with embedded auto-tuners for all implemented algorithms. The package is distributed under the MIT license and the source code and documentation can be found at https://github.com/radar-research-lab/MFGLib/

arXiv.org e-Print Archive

Metabolome and Transcriptome Analyses Unravels Molecular Mechanisms of Leaf Color Variation by Anthocyanidin Biosynthesis in Acer triflorum

Authors: Han Zhiming, Hu Xiaoqing, Pei Xiaona, Qu Guanzheng, Sun Anran, Tigabu Mulualem, Xie Ying, Zhang Shikai, Zhao Xiyang
Publication date: 01/01/2022

Acer triflorum Komarov is an important ornamental tree, and its seasonal change in leaf color is its most striking feature. However, the quantification of anthocyanins and the mechanisms of leaf color change in this species remain unknown. Here, a combined analysis of the metabolome and transcriptome was performed on green, orange, and red leaves. In total, 27 anthocyanin metabolites were detected, and cyanidin 3-O-arabinoside, pelargonidin 3-O-glucoside, and peonidin 3-O-glucoside were significantly correlated with color development. Several structural genes in the anthocyanin biosynthesis process, such as chalcone synthase (CHS), flavanone 3-hydroxylase (F3H), and dihydroflavonol 4-reductase (DFR), were highly expressed in red leaves compared to green leaves. Most regulators (MYB, bHLH, and other classes of transcription factors) were also upregulated in red and orange leaves. In addition, 14 AtrMYBs, including AtrMYB68, AtrMYB74, and AtrMYB35, showed strong interactions with the genes involved in anthocyanin biosynthesis and could thus be considered hub regulators. The findings will facilitate genetic modification or selection for further improvement in the ornamental qualities of A. triflorum.

Epsilon Open Archive

Using the concept of preperitoneal membrane anatomy in total extraperitoneal prosthesis: a preliminary report

Authors: Anran Hu, Huabin Zheng, Jinbo Fu, Penghao Kuang, Rongliang Qiu, Suqiong Lin, Xiaoquan Hong, Yilong Fu
Publication venue: Frontiers Media SA
Publication date: 01/06/2023

Purpose: Total extraperitoneal prosthesis (TEP) is one of the most commonly used laparoscopic inguinal hernia repair procedures. This work aims to report the application of membrane anatomy to TEP and its value in intraoperative space expansion. Methods: The clinical data of 105 patients with inguinal hernia who were treated with TEP from January 2018 to May 2020 (58 patients in the General Department of the Second Hospital of Sanming City, Fujian Province, and 47 patients in the General Department of the Zhongshan Hospital Affiliated to Xiamen University) were retrospectively analyzed. Results: All surgeries were successfully completed under the guidance of the concept of preperitoneal membrane anatomy. The operation time was 27.5 ± 9.0 min, blood loss was 5.2 ± 0.8 ml, and the peritoneum was damaged in six cases. The postoperative hospital stay was 1.5 ± 0.6 days, and five cases of postoperative seroma occurred, all self-absorbed. During the follow-up period of 7–59 months, there was no case of chronic pain or recurrence. Conclusion: The membrane anatomy at the correct level is the premise of a bloodless operation to expand the space while protecting adjacent tissues and organs to avoid complications.

Directory of Open Access Journals

Learning in Mean-Field Games and Continuous-Time Stochastic Control Problems

Author: Hu Anran
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

In recent years, there has been an ever-increasing demand for building reliable and versatile agents in applications arising from numerous fields including autonomous driving, supply chain, manufacturing, e-commerce and finance. To meet these challenging demands, research in decision-making systems has drawn upon a wide range of tools from applied probability, reinforcement learning (RL), stochastic control and game theory. This dissertation focuses on developing new methodologies and efficient algorithms with provable performance guarantees to deal with complex environments such as large population competitions and continuous-time systems. The first part of this dissertation focuses on designing and analyzing RL algorithms for large population games. Large population games have appeared in many real-world problems. Examples include massive multiplayer online role-playing games, high frequency trading, and the sharing economy. However, in general, it becomes increasingly difficult to solve such problems as the number of players in the game grows. Mean-field games (MFGs) provide an ingenious and tractable aggregation approach to approximate the otherwise challenging N-player stochastic games. In Chapter 1, we present a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash Equilibrium to this GMFG, and demonstrates that naively combining reinforcement learning with the fixed-point approach in classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, with analysis of their convergence properties and computational complexities.
Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning and TRPO, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing algorithms for multi-agent reinforcement learning in the $N$-player setting.
The second part of this dissertation focuses on designing and analyzing RL algorithms for continuous-time stochastic dynamical systems. As most physical systems in science and engineering evolve continuously in time, many real-world control tasks, such as those in aerospace, automotive industry and robotics, are naturally formulated in terms of continuous-time dynamical systems. Nevertheless, the mainstream RL algorithms have been designed for discrete-time systems, even though they are widely applied to physical tasks in continuous-time systems. Continuous-time RL algorithms have also been developed in the past decades, but the theoretical guarantees of these works are limited to asymptotic convergence, and non-asymptotic guarantees remain unknown. In Chapter 2, we take the first step towards designing algorithms with non-asymptotic guarantees for solving finite-time-horizon continuous-time linear quadratic (LQ) RL problems in an episodic setting, where both the state and control coefficients are unknown to the controller. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of magnitude $O((\ln M)(\ln\ln M))$, with $M$ being the number of learning episodes. The analysis consists of two components: perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation; and parameter estimation error, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise constant controls, which achieves similar logarithmic regret with an additional term depending explicitly on the time stepsizes used in the algorithm. In Chapter 3, we extend the results beyond linear-quadratic problems, where the unknown linear jump-diffusion process is controlled subject to non-smooth convex costs. We show that the associated linear-convex (LC) control problems admit Lipschitz continuous optimal feedback controls and further prove the Lipschitz stability of the feedback controls. The analysis relies on a stability analysis of the associated forward-backward stochastic differential equation. We then propose a least-squares algorithm which achieves a regret of the order $O(\sqrt{N\ln N})$ on linear-convex learning problems with jumps, where $N$ is the number of learning episodes; the analysis leverages the Lipschitz stability of feedback controls and concentration properties of sub-Weibull random variables. Numerical experiments confirm the convergence and the robustness of the proposed algorithm.

eScholarship - University of California